New estimates of genome size in Orthoptera and their evolutionary implications

Animal genomes vary widely in size, and much of their architecture and content remains poorly understood. Even among related groups, such as orders of insects, genomes may vary in size by orders of magnitude, for reasons that remain unknown. The largest known insect genomes were repeatedly found in Orthoptera, e.g., Podisma pedestris (1C = 16.93 pg), Stethophyma grossum (1C = 18.48 pg) and Bryodemella holdereri (1C = 18.64 pg). While all these species belong to the suborder Caelifera, the ensiferan Deracantha onos (1C = 19.60 pg) was recently found to have the largest genome. Here, we present new genome size estimates of 50 further species of Ensifera (superfamilies Gryllidea, Tettigoniidea) and Caelifera (Acrididae, Tetrigidae) based on flow cytometric measurements. We found that Bryodemella tuberculata (Caelifera: Acrididae) has the largest genome measured in any insect so far, with 1C = 21.96 pg (21.48 Gbp). Species of Orthoptera with 2n = 16 and 2n = 22 chromosomes have significantly larger genomes than species with other chromosome counts. Gryllidea genomes vary between 1C = 0.95 and 2.88 pg, and Tetrigidae genomes between 1C = 2.18 and 2.41 pg, while the genomes of all other studied Orthoptera range in size from 1C = 1.37 to 21.96 pg. Reconstructing ancestral genome sizes based on a phylogenetic tree of mitochondrial genomic data, we found genome size values of >15.84 pg only for the nodes of Bryodemella holdereri / B. tuberculata and Chrysochraon dispar / Euthystira brachyptera. The predicted values of ancestral genome sizes are 6.19 pg for Orthoptera, 5.37 pg for Ensifera, and 7.28 pg for Caelifera. The reasons for the large genomes in Orthoptera remain largely unknown, but a duplication or polyploidization seems unlikely as chromosome numbers do not differ much. Sequence-based genomic studies may shed light on the underlying evolutionary mechanisms.


Introduction
Despite the enormous advances in sequencing technology, much of the structure and function of genomes remains poorly understood. One open question is the 'C-value enigma' or 'C-value paradox' [1], which refers to the observation that different species have highly variable contents of non-coding DNA despite similar amounts of coding DNA. Large amounts of non-coding DNA and, consequently, large genomes pose problems to genomic sequencing and genome assembly. Even genetic studies based on single-read sequencing (i.e., Sanger) may become complicated due to the high prevalence of paralogs [2]. Knowledge of at least the rough size of its genome is therefore a prerequisite for genomic studies on any organism. Unfortunately, the genome sizes of only relatively few species are known. The Animal Genome Size Database [3] holds records of 6,222 species as of 30 June 2022, representing less than 0.37% of the known 1.7 million species. Out of the more than 1 million described species of insects, the most diverse class of organisms, only 1,164 species have a total of 1,345 records in the Animal Genome Size Database. While some species with small genome sizes are well-known as model organisms, e.g., Drosophila melanogaster with 1C = 0.18 pg [4] and Tenebrio molitor with 1C = 0.52 pg [5], many have much larger genomes. One order with exceptionally large genomes is Orthoptera.
For several years, grasshoppers of the family Acrididae have held the records for the largest insect genomes. These were Podisma pedestris (1C = 16.93 pg; [6]), Bryodemella holdereri (1C = 18.64 pg; [7]) and Stethophyma grossum (1C = 18.48 pg; [8]). Satellite DNAs and transposable elements have been suggested as potential explanations for the large sizes [9,10]. Complete genome duplications may be less likely, as there is a lack of correlation of chromosome number and genome size. Despite their mostly higher chromosome numbers, ensiferans typically have smaller genomes than caeliferans [11]. Remarkably in this context, the most recent record holder for genome size in Orthoptera is the ensiferan Deracantha onos (1C = 19.60 pg; [12]). These studies, as reviewed by Gregory [3], show that genome sizes vary widely in grasshoppers and probably also all other groups of Orthoptera, warranting further investigation.
To obtain a better understanding of genome size variation in Orthoptera and its underlying evolutionary mechanisms, we generated estimates of genome size for 50 species of Orthoptera and used a mitogenomic phylogeny to track genome size across the evolution of the group. The main goals of this study were: 1) To provide measurements of the genome size of further species of Orthoptera and thus improve our knowledge on the range and variation of this character. 2) To track the evolution of genome size along the phylogenetic tree of Orthoptera. 3) To compare genome size data with chromosome numbers and, in the light of the XX/X0 sex determination system, discuss their implications for future studies.

Sampling
We collected specimens at eight sites across Germany and one in Austria: Meadows around Motzen, Brandenburg, 52 11.7892. We furthermore obtained specimens of some species that do not naturally occur in Germany from the pet trade. The identification of freshly collected species was based on morphological and bioacoustic characters [13,14]. We follow the nomenclature of Cigliano et al. [15]. Voucher specimens were deposited in the Zoological Museum Hamburg (ZMH), part of the Leibniz Institute for the Analysis of Biodiversity Change (LIB) under the accession ZMH 2019/21. A detailed list of all specimens and samples with individual accession numbers is given in S1 Table. The Government of Upper Bavaria authorized handling, capture, and killing of the insect specimens used in this study with a permit issued on 15 July 2019. Insects were anesthetized and euthanized using CO2.

Genome size measurements
We measured the nuclear DNA content (2C) of samples using the flow cytometry method (FCM) as described in Sadilek et al. [16,17] (see also [8]). For every sample, we extracted the muscle tissue of one hind femur and homogenized it with a leaf of the internal plant standard Pisum sativum L. "Ctirad" (Fabaceae) with 2C = 9.09 pg [18,19] in 500 μl of Otto buffer I at 4˚C. A plant standard was used to ensure comparability with previously conducted measurements of the same facility [8]. We then filtered the cell suspension through a 42 μm nylon mesh and split it into two halves. One half was stained with 1,000 μl DAPI solution (1 ml of Otto II buffer (0.4 M Na2HPO4 · 12 H2O) supplemented by AT-selective fluorescent dye DAPI (4',6-diamidino-2-phenylindole) and 2-mercaptoethanol in final concentrations of 4 μg/ml and 2 μl/ml, respectively) and the second half with 1,000 μl propidium iodide (PI) solution (1 ml of Otto II buffer (0.4 M Na2HPO4 · 12 H2O) supplemented by intercalating fluorescent dye PI, RNase and 2-mercaptoethanol in final concentrations of 50 μg/ml, 50 μg/ml and 2 μl/ml, respectively) for several minutes. We conducted the analysis of the DAPI-stained sample in a Partec CyFlow instrument with a UV LED chip and the analysis of the PI-stained sample in a Partec SL instrument with a green solid-state laser (Cobolt Samba, 532 nm, 100 mW; Partec GmbH, Münster, Germany). 3,500 to 5,000 particles were recorded in each FCM analysis. We analyzed the output data with the Partec FloMax v. 2.52 software (Partec GmbH, Münster, Germany). Sample genome size was calculated from the ratio of the measured sample peak to the measured standard peak, multiplied by the known genome size of the standard. Median coefficients of variation are 2.31 for DAPI and 3.81 for PI. All measurements and analyses were conducted at the Institute of Botany of the Czech Academy of Sciences, Prague.
Combining the DAPI and PI measurements of the same specimen allows estimation of the AT/GC ratio of the genome of the species, i.e., the GC content (e.g., [16,17,20]). The GC content of P. sativum is 38.50% [21] and the GC content of the analyzed samples was calculated with a Microsoft Excel macro [20]. Measurements in pg were converted to base pairs × 10^9 (Gbp) using the formula 1 pg = 0.978 Gbp [22].
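The two calculations described above, scaling by an internal standard and converting picograms to Gbp, can be sketched as follows. This is a minimal illustration, not the workflow used for the measurements: the function names and the peak values are hypothetical, and the sketch assumes that fluorescence scales linearly with DNA content and that 1C is half of the measured 2C value. Only the standard's 2C value (9.09 pg) and the conversion factor (0.978 Gbp per pg) are taken from the text.

```python
# Illustrative sketch; function names and peak values are hypothetical.
STANDARD_2C_PG = 9.09  # Pisum sativum "Ctirad", 2C value given in the text
GBP_PER_PG = 0.978     # conversion factor given in the text


def sample_2c_pg(sample_peak, standard_peak, standard_2c_pg=STANDARD_2C_PG):
    """Scale the known 2C value of the standard by the ratio of peak positions.

    Assumes fluorescence intensity is proportional to DNA content.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_2c_pg * sample_peak / standard_peak


def pg_to_gbp(picograms):
    """Convert a DNA mass in pg to Gbp using 1 pg = 0.978 Gbp."""
    return picograms * GBP_PER_PG


# Hypothetical peak means in arbitrary fluorescence units, not measured data.
two_c = sample_2c_pg(sample_peak=400.0, standard_peak=200.0)
one_c = two_c / 2.0  # haploid genome size, assuming 1C = 2C / 2

print(f"2C = {two_c:.2f} pg, 1C = {one_c:.2f} pg, 1C = {pg_to_gbp(one_c):.2f} Gbp")
```

Applied to the value reported for Bryodemella tuberculata, `pg_to_gbp(21.96)` gives approximately 21.48 Gbp, matching the figure stated in the abstract.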

Analyses of genome size data
We assembled a dataset of newly measured species and Orthoptera genome size measurements from previous studies, based on the Animal Genome Size Database [3] complemented with further recent studies. We then plotted the male genome sizes against the number of chromosomes [11] for all species for which both values were available; we used male genome size because more male measurements are available from the literature combined with our new data. We tested for statistical significance using a Kruskal-Wallis test and pairwise Mann-Whitney tests (Bonferroni corrected) of all chromosome numbers with more than one record in PAST 4.03 [23]. Finally, we used our data to calculate the difference between mean male and female genome size for each species where both sexes were available and tested for correlation of size differences between sexes and male genome size with Pearson's r.
We also tested for correlation between genome size and GC content among our new measurements, independently for female and male specimens. We then checked if GC content was generally different between females and males using a t-test. As this test yielded non-significant results (see Results), we used an ANOVA and pairwise Mann-Whitney tests (Bonferroni corrected) to test for differences in GC content between families.
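The comparison of female and male genome sizes described in this section can be sketched in a few lines. The example below is an assumption-laden illustration rather than the analysis itself (which was run in PAST): the species labels and 1C values are invented, and Pearson's r is implemented directly so that the sketch has no external dependencies.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical 1C values in pg per species and sex; not data from this study.
measurements = {
    "species_a": {"female": [10.4, 10.6], "male": [10.0, 10.2]},
    "species_b": {"female": [2.3, 2.5], "male": [2.2, 2.4]},
    "species_c": {"female": [6.0, 6.2], "male": [5.7, 5.9]},
    "species_d": {"female": [15.1, 15.5], "male": [14.6, 14.8]},
}

# Mean male genome size and female-minus-male difference for each species.
male_means = [mean(m["male"]) for m in measurements.values()]
sex_differences = [mean(m["female"]) - mean(m["male"]) for m in measurements.values()]

r = pearson_r(male_means, sex_differences)
print(f"Pearson's r between male genome size and sex difference: {r:.3f}")
```

A positive r in such a comparison would indicate that the absolute difference between the sexes grows with genome size, as expected if the X chromosome scales with the rest of the genome.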

The evolution of genome size in Orthoptera
In order to track the development of genome size along the evolutionary history of Orthoptera, we plotted the known genome sizes on a phylogenetic tree. We assembled a dataset of complete and partial mitochondrial genomes for tree reconstruction with our new measurements combined with data from GenBank and BOLD [24] (S1 Table). Out of the 146 species with known genome sizes, we found complete mitochondrial genomes available for 86 individuals belonging to 70 species. Under the rationale that all species included in the dataset should at least be represented in the mitochondrial genes COI, CytB, COII or ND5, we added 49 further species for which at least one of these additional mitochondrial markers was available. We aligned the dataset using MUSCLE [25] integrated in Geneious v.10.0.9 [26] and KALIGN [27]. Since many regions of the mitogenome were represented in only relatively few specimens, we reduced the dataset to the genes Cytochrome C Oxidase I and II, Cytochrome B, and NADH Dehydrogenase 5.
We then reconstructed a Maximum Likelihood tree using the IQtree web server [28,29] with automatic substitution model selection, 1,000 bootstrap alignments, and 1,000 iterations under a minimum correlation coefficient of 0.99, treating all mitochondrial genes as one partition. The single branch test was performed after 1,000 replicates, a perturbation strength of 0.5, and an internal IQ-Tree stopping rule of 100. Based on this tree, we reconstructed ancestral states of genome size in the R v.3.6.3 environment [30] using the packages 'phytools' [31] and 'ape' [32]. The ape package implements non-parametric rate smoothing and does not require ultrametrization of the tree. The R code is available under https://github.com/laradey/Genome_size_Orthoptera.git. In species for which more than one measurement was available, we used the mean value.
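To illustrate the idea behind ancestral state reconstruction of a continuous trait such as genome size, the following sketch estimates the value at the root of a small tree under a Brownian motion model, using the pruning step of Felsenstein's independent contrasts. It is not the R code referenced above and makes no claim about which routine of 'phytools' or 'ape' was used; the tree, tip names, and trait values are invented, and the sketch assumes a strictly bifurcating tree with positive branch lengths.

```python
def prune(node, traits):
    """Return (estimated value, effective branch length) for a node.

    A node is either ("tip", name, length) or ("node", left, right, length).
    Child values are averaged with weights inversely proportional to their
    branch lengths; the node's own branch is then lengthened to account for
    the uncertainty of that estimate.
    """
    if node[0] == "tip":
        _, name, length = node
        return traits[name], length
    _, left, right, length = node
    x1, v1 = prune(left, traits)
    x2, v2 = prune(right, traits)
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    return value, length + v1 * v2 / (v1 + v2)


# Hypothetical tree ((A:1,B:1):1,C:2) with invented 1C values in pg.
tree = (
    "node",
    ("node", ("tip", "A", 1.0), ("tip", "B", 1.0), 1.0),
    ("tip", "C", 2.0),
    0.0,  # the root has no branch above it
)
genome_size_pg = {"A": 10.0, "B": 12.0, "C": 6.0}

root_value, _ = prune(tree, genome_size_pg)
print(f"Estimated root genome size: {root_value:.3f} pg")
```

Only the value returned for the root corresponds to the maximum likelihood estimate under Brownian motion; values computed at other internal nodes during pruning use information from their descendants alone. Estimating every node would require repeating the calculation with the tree rerooted at each node in turn.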

Evolutionary analysis
The phylogenetic tree based on mitochondrial genomes generated with IQtree is given as partial trees of Ensifera (Fig 2) and Caelifera (Fig 3) (see also S1 File for the complete tree). The tree is overall well supported with bootstrap values of >95. Caelifera and Ensifera were found monophyletic, as are Gryllidea and Tettigoniidea within Ensifera. Caelifera is only represented by Acrididea, since no members of Tridactylidea were included. The genus Meconema is found as the sister group to all other Tettigoniidea, making Tettigoniidae paraphyletic with respect to all other families of this infraorder. All subfamilies of Tettigoniidae included are retrieved as monophyletic. The only genera of this group found paraphyletic are Ruspolia (with respect to Neoconocephalus) and Metrioptera (with respect to Bicolorana). Within Caelifera, both Tetrigidae and Acrididae, as well as all subfamilies of Acrididae included by us, were retrieved as monophyletic. Plotting genome sizes on the tree shows consistent values of mean genome size in different clades of the Orthoptera tree (Fig 4). The ancestral state reconstruction found genome size values of >15.84 pg only for the nodes of Bryodemella holdereri / B. tuberculata and Chrysochraon dispar / Euthystira brachyptera (S1 File). The predicted genome size values are 1C = 6.19 pg for Orthoptera, 5.37 pg for Ensifera, and 7.28 pg for Caelifera.

Discussion
The diversity in genome size is an important parameter in organismal diversity research, yet it remains poorly studied. This is particularly true for insects, where genome size is known for less than 0.5% of the described species [3]. So far, the largest genomes of all insects have been detected among members of Orthoptera, exceeding the human genome by a factor of seven. Nevertheless, data are available for only a small selection of species. The variation in measurement methods and standards also somewhat limits the comparability of genome sizes from different studies and databases. However, the commonly observed congruence between values from different sources (S1 File) suggests to us that joint analyses are valid. We contributed measurements of 50 species, 38 of them measured for the first time, and added these to the known set of species. The variation in genome sizes was even larger than expected but does not follow a clear pattern.

Genome size variation in Orthoptera
The newly measured genome size of 1C = 21.96 pg in Bryodemella tuberculata surpasses the previous records of 1C = 19.60 pg in Deracantha onos [12], 1C = 18.64 pg in B. holdereri [7], 1C = 18.48 pg in Stethophyma grossum [8], and 1C = 16.93 pg in Podisma pedestris [6]. We are not aware of any insect with a larger genome published in the meantime. Therefore, B. tuberculata holds the record for the largest known insect genome size. The genome size of Chrysochraon dispar (1C (female) = 19.43 pg) also surpasses that of all previously known genomes of Caelifera. Furthermore, the largest genome we measured in Stethophyma grossum slightly exceeds the value Husemann et al. [8] found in that species. This intraspecific or measurement variation is within the ranges detected in other similar studies [7,12].
Schielzeth et al. [33] provided a measurement of the genome size of Chorthippus biguttulus (Acrididae) of 1C = 236.05 pg, which would exceed our measurement of B. tuberculata by an order of magnitude. We measured the genome size of Ch. biguttulus as 1C = 10.99 pg, which is in line with the measurements of Shah et al. [9] (1C = 9.31 pg) and Husemann et al. [8] (1C = 11.31 pg). This suggests that, as commented by Camacho [34], the measurement of Schielzeth et al. [33] was indeed unreliable and does not represent a true value. Within Acrididae, the largest genomes belong to representatives of the subfamilies Oedipodinae (maximum: Bryodemella tuberculata, 2n = 22+XX, 1C = 21.96 pg), Gomphocerinae (Chrysochraon dispar, 2n = 16+XX, 1C = 19.43 pg), and Melanoplinae (Podisma pedestris, 2n = 22+X0, 1C = 16.93 pg). Fig 1 shows that species with male chromosome counts of 2n = 16+X0, followed by 2n = 22+X0, have the largest genomes [11]. All these species belong to the family Acrididae. The representatives of other families of Caelifera, Tetrigidae and Morabidae, have far smaller recorded genomes. Overall, the genomes measured in Orthoptera so far span a large size range from less than 1 Gbp in some crickets to more than 20 Gbp as measured here for Oedipodinae. This suggests complex evolutionary processes underlying the evolution of genomes in Orthoptera, which will have to be explored in the future. So far, few Orthoptera genomes have been sequenced [35] owing to their large size, but comparative genomic analyses across genomes of different sizes will be necessary to understand the genome gigantism in this group.
Our study adds to other recent works [7,8,12], in providing new records of the largest genome size in Orthoptera and at the same time in all insects by studying just a comparatively limited number of species. We consider it very likely that future studies will discover even larger genomes in Orthoptera or among members of another insect order.

The evolution of genome size
In order to study the evolution of genome size in Orthoptera, we plotted the known measurements on a phylogenetic tree based on mitochondrial data. We selected mitochondrial data because it was available for a large number of species included in our genome size dataset (110 out of 146 = 75.3%). The tree obtained by Maximum Likelihood reconstruction is largely congruent with other trees available for Orthoptera so far [36-40]. However, it does not resolve the positions of Gryllotalpidae (Gryllotalpa), Mogoplistidae (Ornebius), and Trigonidiidae (Nemobius), which have been placed as hierarchical sister groups to Gryllidae in other studies [40,41]. We found Tettigoniidae paraphyletic with respect to a clade consisting of Anostostomatidae (Hemideina) + Rhaphidophoridae (Ceuthophilus + Hadenoecus) that is placed as sister group to Meconematinae, albeit with poor bootstrap support of 70% (S1 File). Note that Hemideina measurements were generated with the DAPI stain, which is influenced by the AT/GC ratio of the genome and is therefore considered less accurate than the PI stain used in all other FCM values listed here. Acrida was retrieved as the sister group to all other Acrididae, which contrasts with previous hypotheses [37,40]. The reconstructed paraphyly of the genera Ruspolia and Metrioptera most likely reflects the urgent need to revise their taxonomy. The genus Chorthippus is notoriously complicated, and a revision will have to be based on more comprehensive data, as mitochondrial datasets have been shown to yield phylogenetic results incongruent with those based on larger genomic datasets [42].

PLOS ONE
Our phylogenetic tree also shows that exceptionally large genome sizes (more than 1C = 16 pg) are attained only in isolated clades. This result is certainly restricted by our dataset, which includes only few representatives of many clades. Nevertheless, it shows that large genome sizes are characteristic of only single genera (Bryodemella, Deracantha, Stethophyma) or closely related genera, e.g., Chrysochraon + Euthystira. Husemann et al. [43] estimated the split of Bryodemella from Sphingonotus at about 31.8 million years ago (ma) [38]. The split of Euthystira from Euchorthippus was estimated at 12.88 [15.69-9.96] ma by Hawlitschek et al. [42]. There are no estimates for the split of Stethophyma from Epacromius + Aiolopus. Estimating the age of the lineage of Deracantha is difficult due to its contradictory placement in the phylogenetic trees of Mugleston et al. [36] and Yuan et al. [12], but lineages in Ensifera are generally older than in Caelifera. The splits from related clades with smaller genome sizes (e.g., Phaneroptera) have been dated to around 100 ma in these studies. Based on these estimates, some large genomes may be of comparatively old evolutionary age. However, the increase in genome size is not necessarily related to the splitting of lineages we were able to detect. Duplications of chromosomes may have played a role in speciation, with subsequent merging of chromosomes to the original number, but due to lack of evidence this remains speculation. Much finer phylogenetic resolution at the genus and species level will be required to track the evolution of genome size in individual clades more reliably.

The relationship of genome size with life history and cytogenetic traits
No previous studies have been able to answer the question as to why some species of grasshoppers have such large genomes. Some of the earlier record holders, Podisma pedestris and Stauroderus scalaris, as well as Bryodemella tuberculata, are species of montane habitats in Central Europe today. However, this does not hold true for their current global and historical European distributions [15]. Our sampling for this study was restricted to Central Europe, and more species from other regions need to be studied to detect possible correlations between genome size and ecological or geographic variables.
Hypotheses on a correlation between life history traits and genome size have also been raised, e.g., body size and the ability to fly [44-46]. Larger body size was hypothesized to correlate with larger genomes, whereas the genomes of flying species were hypothesized to be smaller than those of flightless species. Yuan et al. [12] tested for any correlation of these traits with genome size in their dataset of tettigoniid ensiferans but found none. We did not test this in our dataset because it covers a wide phylogenetic range at rather coarse taxonomic resolution, and a finer scale will probably be necessary to detect any such correlation. Caeliferan species with particularly large genomes, such as Chrysochraon dispar and Podisma pedestris, are flightless (with the exception of rare long-winged individuals), whereas Bryodemella tuberculata and Stethophyma grossum are good fliers [15]. Among the Ensifera, Deracantha onos is large-bodied and flightless, as is Hemideina crassidens, whose genome is substantially smaller (1C = 5.7 pg). No other large flightless ensiferan genome has been analyzed, which makes the search for correlation between these traits and genome size difficult.
Our dataset includes two cave-dwelling species, Ceuthophilus stygius and Hadenoecus subterraneus (both Gryllidea-Rhaphidophoridae-Ceuthophilinae). While both species have very similar chromosome counts (36+X0 vs. 34+X0), the genome of C. stygius is almost the five-fold size of that of H. subterraneus (9.55 pg vs. 1.55 pg). It is therefore difficult to speculate if the adaptation to the cave environment has any consequences for genome size. Notably, Gryllidea are overall very heterogeneous regarding genome size (0.95 pg to 9.55 pg) and chromosome count (10+X0 to 36+X0).
Other than life history and ecology, genome size has been hypothesized to correlate with traits of cytogenetics and genome architecture. Several studies found large genomes, including those of orthopterans, rich in satellite DNA, long terminal repeats and transposons, including helitrons and mariner-like elements [9,10,47-49]. Whole genome duplications are another presumably common reason for large genome size (e.g., in fish [50]), going along with a polyploidization and a large number of chromosomes. However, such a relation was not found in Orthoptera. Intuitively, taxa with more chromosomes might be expected to also have larger genome sizes. Our analysis does not suggest any positive correlation of chromosome number and genome size. Some taxa with especially small numbers of chromosomes, such as European Gomphocerinae, have some of the largest genomes [8]. This suggests that the chromosome number reduction is associated with fusions rather than the actual loss of chromosomes.
On the other hand, we found, despite an overall rather narrow range of GC content, a correlation between genome size and GC content. Larger genomes had a generally higher GC content. There was no indication of difference in GC content between sexes, but the families studied here differed significantly. Acrididae and Tettigoniidae (with large genomes) were found to have genomes with higher GC content compared to Tetrigidae and Gryllidae (with small genomes). The general implications of GC content on animal genomes are not well studied. Low GC content may be an indicator of the presence of bacterial endosymbionts [51], whereas high GC content may be a sign of low chromatin condensation [52]. How these phenomena affect insects has not been studied [53].
Like the majority of Orthoptera investigated, most species included in our study follow an XX/X0 sex determination system, implying that female genomes should be larger than male genomes simply due to the second copy of the sex chromosome X. The same can be assumed for species with XX/XY (here in Oecanthus), as neo-Y chromosomes should be smaller than X chromosomes [54]. We find this reflected in the difference between female and male genomes of most species. However, the differences are minuscule in some species and even inverted (with male genomes larger than female genomes) in Chorthippus dorsatus and Myrmeleotettix maculatus (both Acrididae). In the case of M. maculatus, the inversion can most likely be attributed to different methods used to measure genome size (Feulgen densitometry vs. flow cytometry) of males and females in different studies. Conversely, the specimens of Ch. dorsatus were from the same locality and measured in the same workflow, suggesting real intraspecific variability. The presence of B chromosomes might offer an explanation for the larger genome size of males than females, but no such phenomena have been reported for this species [55,56] and chromosomes of specimens were not analyzed in the present study. Conclusions drawn on intraspecific differences in genome size will have to be backed by much larger sample sizes, but we maintain that the comparison of our present genome size measurements with those reported for the same species by previous studies shows sufficient overall congruence to allow for interspecific comparisons and the tracking of genome size evolution.
Finally, genomic sequence data will be necessary to investigate the reasons behind the huge genomes of Orthoptera. Currently, better transcriptome and genome assemblies are on the way which may help to better understand the reasons for the large sizes of Orthoptera genomes [35,57].
Supporting information
S1 Table. Complete table of all genome size measurements of Orthoptera reviewed for this study.